Processor Mapping Techniques Toward Efficient Data Redistribution

Author

  • Edgar T. Kalns
Abstract

Run-time data redistribution can enhance algorithm performance in distributed-memory machines. Explicit redistribution of data can be performed between algorithm phases when a different data decomposition is expected to deliver increased performance for a subsequent phase of computation. Redistribution, however, represents increased program overhead, as algorithm computation is discontinued while data are exchanged among processor memories. In this paper, we present a technique that minimizes the amount of data exchange for BLOCK to CYCLIC(c) (or vice versa) redistributions of an arbitrary number of dimensions. Preserving the semantics of the target (destination) distribution pattern, the technique manipulates the data-to-logical-processor mapping of the target pattern. When implemented on an IBM SP-x, the mapping technique demonstrates redistribution performance improvements of approximately 40% over traditional data-to-processor mapping. Relative to the traditional mapping technique, the proposed method affords greater flexibility in specifying precisely which data elements are redistributed and which elements remain on-processor.
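
The idea in the abstract can be illustrated with a small one-dimensional sketch. It is not the paper's algorithm: it assumes a single array dimension, uses an exhaustive search over processor permutations (practical only for very small processor counts), and all function names are hypothetical. It counts how many elements stay on-processor in a BLOCK to CYCLIC(c) redistribution when the logical processors of the target pattern are assigned to physical processors in the identity order, and compares that with the best assignment found by the search.

```python
from itertools import permutations

def block_owner(i, n, p):
    """Processor holding element i under BLOCK (contiguous chunks of ceil(n/p))."""
    size = -(-n // p)  # ceiling division
    return i // size

def cyclic_owner(i, c, p):
    """Logical processor holding element i under CYCLIC(c) (blocks of c dealt round-robin)."""
    return (i // c) % p

def overlap_matrix(n, p, c):
    """overlap[s][t]: elements on physical processor s (BLOCK) needed by logical processor t (CYCLIC(c))."""
    overlap = [[0] * p for _ in range(p)]
    for i in range(n):
        overlap[block_owner(i, n, p)][cyclic_owner(i, c, p)] += 1
    return overlap

def identity_kept(n, p, c):
    """Elements that stay on-processor when logical processor t is physical processor t."""
    overlap = overlap_matrix(n, p, c)
    return sum(overlap[t][t] for t in range(p))

def best_mapping(n, p, c):
    """Brute-force search (an assumption of this sketch) for the logical-to-physical
    mapping that keeps the most elements on-processor. Returns (mapping, kept),
    where mapping[t] is the physical processor assigned to logical processor t."""
    overlap = overlap_matrix(n, p, c)

    def kept(perm):
        return sum(overlap[perm[t]][t] for t in range(p))

    best = max(permutations(range(p)), key=kept)
    return best, kept(best)

if __name__ == "__main__":
    n, p, c = 16, 4, 2
    mapping, kept = best_mapping(n, p, c)
    print("identity mapping keeps", identity_kept(n, p, c), "of", n, "elements")
    print("best mapping", mapping, "keeps", kept, "of", n, "elements")
```

For 16 elements, 4 processors, and CYCLIC(2), the identity mapping keeps 4 elements in place while the best permutation keeps 8. The target distribution is still CYCLIC(2) in both cases; only the assignment of logical to physical processors differs.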


Similar resources

A Media Processor for Multimedia Signal Processing

A multimedia system requires processing capabilities that include controlling functions as well as high throughput. For the consumer market, this processor must also satisfy low-cost constraints. An efficient solution is the use of a dual-issue RISC architecture with key enhancements to target the high computation needs of multimedia applications. The RISC design methodology ensures its ease of...


Efficient VLSI Implementation of Iterative Solutions to Sparse Linear Systems

We propose a novel way of solving systems of linear equations with sparse coefficient matrices using iterative methods on a VLSI array. The nonzero entries of the coefficient matrix are mapped onto a processor array of size √e × √e, where e is the number of nonzero elements, n is the number of equations and e ≥ n. The data transport problem that arises because of this mapping is solved using an efficient...


Retargetable Compilers for Embedded DSPs

Programmable devices are a key technology for the design of embedded systems, such as in the consumer electronics market. Processor cores are used as building blocks for more and more embedded system designs, since they provide a unique combination of features: flexibility and reusability. Processor-based design implies that compilers capable of generating efficient machine code are necessary. Howe...


A novel 32 bit RISC architecture unifying RISC and DSP

A novel 32 bit RISC architecture is presented which is the basis of a powerful general purpose microprocessor and in parallel a 16/32 bit fixed-point DSP processor. This unifying of RISC and DSP was not achieved by simply using a microprocessor and DSP core, but a new concept for the implementation of DSP processors has been developed. With the architecture presented it has been proven that a DSP...


Phase-Coupled Mapping of Data Flow Graphs to Irregular Data Paths

Many software compilers for embedded processors produce machine code of insufficient quality. Since for most applications software must meet tight code speed and size constraints, embedded software is still largely developed in assembly language. In order to eliminate this bottleneck and to enable the use of high-level language compilers also for embedded software, new code generation and optimiz...




Publication date: 1995